romanized sinhala
Swa-bhasha Resource Hub: Romanized Sinhala to Sinhala Transliteration Systems and Data Resources
Sumanathilaka, Deshan, Perera, Sameera, Dharmasiri, Sachithya, Athukorala, Maneesha, Herath, Anuja Dilrukshi, Dias, Rukshan, Gamage, Pasindu, Weerasinghe, Ruvan, Priyadarshana, Y. H. P. P.
The Swa-bhasha Resource Hub provides a comprehensive collection of data resources and algorithms developed for Romanized Sinhala to Sinhala transliteration between 2020 and 2025. These resources have played a significant role in advancing research in Sinhala Natural Language Processing (NLP), particularly in training transliteration models and developing applications involving Romanized Sinhala. The current openly accessible data sets and corresponding tools are made publicly available through this hub. This paper presents a detailed overview of the resources contributed by the authors and includes a comparative analysis of existing transliteration applications in the domain.
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- Europe > United Kingdom > Wales (0.04)
- Asia > Sri Lanka > Western Province > Colombo > Colombo (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Research Report (0.64)
- Overview (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.53)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Stylomech: Unveiling Authorship via Computational Stylometry in English and Romanized Sinhala
Faumi, Nabeelah, Gunathilake, Adeepa, Wickramanayake, Benura, Dias, Deelaka, Sumanathilaka, TGDK
With the advent of Web 2.0, the development in social technology coupled with global communication systematically brought positive and negative impacts to society. Copyright claims and Author identification are deemed crucial as there has been a considerable amount of increase in content violation owing to the lack of proper ethics in society. The Author's attribution in both English and Romanized Sinhala became a major requirement in the last few decades. As an area largely unexplored, particularly within the context of Romanized Sinhala, the research contributes significantly to the field of computational linguistics. The proposed author attribution system offers a unique approach, allowing for the comparison of only two sets of text: suspect author and anonymous text, a departure from traditional methodologies which often rely on larger corpora. This work focuses on using the numerical representation of various pairs of the same and different authors allowing for, the model to train on these representations as opposed to text, this allows for it to apply to a multitude of authors and contexts, given that the suspected author text, and the anonymous text are of reasonable quality. By expanding the scope of authorship attribution to encompass diverse linguistic contexts, the work contributes to fostering trust and accountability in digital communication, especially in Sri Lanka. This research presents a pioneering approach to author attribution in both English and Romanized Sinhala, addressing a critical need for content verification and intellectual property rights enforcement in the digital age.
- Asia > Sri Lanka (0.25)
- North America > United States > California > Santa Clara County > San Jose (0.04)
- North America > United States > California > Orange County > Anaheim (0.04)
- (2 more...)
- Research Report > Promising Solution (0.34)
- Overview > Innovation (0.34)
IndoNLP 2025: Shared Task on Real-Time Reverse Transliteration for Romanized Indo-Aryan languages
Sumanathilaka, Deshan, Anuradha, Isuri, Weerasinghe, Ruvan, Micallef, Nicholas, Hough, Julian
The paper overviews the shared task on Real-Time Reverse Transliteration for Romanized Indo-Aryan languages. It focuses on the reverse transliteration of low-resourced languages in the Indo-Aryan family to their native scripts. Typing Romanized Indo-Aryan languages using ad-hoc transliterals and achieving accurate native scripts are complex and often inaccurate processes with the current keyboard systems. This task aims to introduce and evaluate a real-time reverse transliterator that converts Romanized Indo-Aryan languages to their native scripts, improving the typing experience for users. Out of 11 registered teams, four teams participated in the final evaluation phase with transliteration models for Sinhala, Hindi and Malayalam. These proposed solutions not only solve the issue of ad-hoc transliteration but also empower low-resource language usability in the digital arena.
- Asia > Sri Lanka > Western Province > Colombo > Colombo (0.04)
- Asia > Pakistan (0.04)
- Asia > Nepal (0.04)
- (8 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
- (2 more...)
EmoScan: Automatic Screening of Depression Symptoms in Romanized Sinhala Tweets
Hewapathirana, Jayathi, Sumanathilaka, Deshan
This work explores the utilization of Romanized Sinhala social media data to identify individuals at risk of depression. A machine learning-based framework is presented for the automatic screening of depression symptoms by analyzing language patterns, sentiment, and behavioural cues within a comprehensive dataset of social media posts. The research has been carried out to compare the suitability of Neural Networks over the classical machine learning techniques. The proposed Neural Network with an attention layer which is capable of handling long sequence data, attains a remarkable accuracy of 93.25% in detecting depression symptoms, surpassing current state-of-the-art methods. These findings underscore the efficacy of this approach in pinpointing individuals in need of proactive interventions and support. Mental health professionals, policymakers, and social media companies can gain valuable insights through the proposed model. Leveraging natural language processing techniques and machine learning algorithms, this work offers a promising pathway for mental health screening in the digital era. By harnessing the potential of social media data, the framework introduces a proactive method for recognizing and assisting individuals at risk of depression. In conclusion, this research contributes to the advancement of proactive interventions and support systems for mental health, thereby influencing both research and practical applications in the field.